Object recognition and computer vision 2019/2020

Jean Ponce, Ivan Laptev, Cordelia Schmid and Josef Sivic

Assignment 1: Instance-level recognition

Adapted from practicals from Andrea Vedaldi and Andrew Zisserman
by Gul Varol and Ignacio Rocco


STUDENT: AL Houceine KILANI

EMAIL: al_houceine.kilani@ens-paris-saclay.fr | houcinekila@gmail.com

Guidelines

The purpose of this assignment is that you get hands-on experience with the topics covered in class, which will help you understand these topics better. Therefore, it is imperative that you do this assignment yourself. No code sharing will be tolerated.

Once you have completed the assignment, you will submit the ipynb file containing both code and results. For this, make sure to run your notebook completely before submitting.

The ipynb must be named using the following format: A1_LASTNAME_Firstname.ipynb, and submitted in the class Moodle page.

Goal

The goal of instance-level recognition is to match (recognize) a specific object or scene. Examples include recognizing a specific building, such as Notre Dame, or a specific painting, such as `Starry Night’ by Van Gogh. The object is recognized despite changes in scale, camera viewpoint, illumination conditions and partial occlusion. An important application is image retrieval - starting from an image of an object of interest (the query), search through an image dataset to obtain (or retrieve) those images that contain the target object.

The goal of this assignment is to experiment and get basic practical experience with the methods that enable specific object recognition. It includes: (i) using SIFT features to obtain sparse matches between two images; (ii) using similarity co-variant detectors to cover changes in viewpoint; (iii) vector quantizing the SIFT descriptors into visual words to enable large scale retrieval; and (iv) constructing and using an image retrieval system to identify objects.

Setup environment

Download and install CyVLFeat

In [1]:
!wget -N http://www.di.ens.fr/willow/teaching/recvis/assignment1/install_cyvlfeat.py
%run install_cyvlfeat.py
--2019-10-16 13:31:41--  http://www.di.ens.fr/willow/teaching/recvis/assignment1/install_cyvlfeat.py
Resolving www.di.ens.fr (www.di.ens.fr)... 129.199.99.14
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/willow/teaching/recvis/assignment1/install_cyvlfeat.py [following]
--2019-10-16 13:31:42--  https://www.di.ens.fr/willow/teaching/recvis/assignment1/install_cyvlfeat.py
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/x-python]
Saving to: ‘install_cyvlfeat.py’

install_cyvlfeat.py     [ <=>                ]   3.61K  --.-KB/s    in 0s      

2019-10-16 13:31:43 (12.4 MB/s) - ‘install_cyvlfeat.py’ saved [3701]

installing cython... done!
downloading vlfeat binaries... done!
uncompressing vlfeat binaries... done!
cloning pyvlfeat repo to cyvlfeat_git folder... done!
building cyvlfeat... done!
installing cyvlfeat... done! NOTE: to uninstall run: cat cyvlfeat_files.txt | xargs rm -rf
copying vlfeat dynamic libary to a reachable location... done!
Installation of cyvlfeat is finished

Imports

In [0]:
import cyvlfeat
import numpy as np
from skimage.io import imread
from skimage.transform import resize, rotate
import matplotlib as mpl
import matplotlib.pyplot as plt
import warnings
from cyvlfeat.plot import plotframes
from scipy.io import loadmat
import numpy as np

# change some default matplotlib parameters
mpl.rcParams['axes.grid'] = False
mpl.rcParams['figure.dpi'] = 120

# ignore warnings
warnings.filterwarnings('ignore')

Download and uncompress data

In [3]:
print('downloading assignment images...')
!wget -c http://www.di.ens.fr/willow/teaching/recvis/assignment1/A1_images.tar.gz
print('done!')
print('uncompressing...')
!tar -xzf A1_images.tar.gz
print('done!')
downloading assignment images...
--2019-10-16 13:32:29--  http://www.di.ens.fr/willow/teaching/recvis/assignment1/A1_images.tar.gz
Resolving www.di.ens.fr (www.di.ens.fr)... 129.199.99.14
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/willow/teaching/recvis/assignment1/A1_images.tar.gz [following]
--2019-10-16 13:32:29--  https://www.di.ens.fr/willow/teaching/recvis/assignment1/A1_images.tar.gz
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘A1_images.tar.gz’

A1_images.tar.gz        [               <=>  ] 242.70M  9.04MB/s    in 30s     

2019-10-16 13:33:01 (8.02 MB/s) - ‘A1_images.tar.gz’ saved [254489201]

done!
uncompressing...
done!

Part 1: Sparse features for matching specific objects in images

Feature point detection

The SIFT feature has both a detector and a descriptor. The detector used in SIFT corresponds to the "difference of Gaussians" (DoG) detector, which is an approximation of the "Laplacian of Gaussian" (LoG) detector.
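As an aside (not part of the assignment code), the DoG approximation can be sketched with two Gaussian blurs; the function name, the scale ratio `k` and the random test image are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(image, sigma, k=2 ** (1 / 3)):
    """Difference of two Gaussian blurs at scales k*sigma and sigma,
    which approximates the (scale-normalized) Laplacian of Gaussian."""
    return gaussian_filter(image, k * sigma) - gaussian_filter(image, sigma)

img = np.random.rand(64, 64).astype(np.float32)
resp = dog_response(img, sigma=1.6)
print(resp.shape)  # same shape as the input image
```

The SIFT detector builds such responses over a pyramid of scales and keeps local extrema across both position and scale.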

We will start by computing and visualizing the SIFT feature detections (usually called frames) for two images of the same object (a building facade). Load an image, rotate and scale it, and then display the original and transformed pair:

In [66]:
# Load an image
im1 = imread('data/oxbuild_lite/all_souls_000002.jpg')

# Let the second image be a rotated and scaled version of the first
im1prime = rotate(np.pad(im1, ((0,0), (200,200),(0,0)), 'constant'), 35, 'bilinear')
h, w, _ = im1prime.shape
im1prime = resize(im1prime, (int(0.7 * h), int(0.7 * w)))
im1prime = (255 * im1prime).astype(np.uint8)

f, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(im1)
ax2.imshow(im1prime)
plt.show()

A SIFT frame is a circle with an orientation and is specified by four parameters: the center $x$, $y$, the scale $s$, and the rotation $\theta$ (in radians), resulting in a vector of four parameters $(x, y, s, \theta)$.

Now compute and visualise the SIFT feature detections (frames):

In [0]:
def rgb2gray(rgb):
    return np.float32(np.dot(rgb[...,:3], [0.299, 0.587, 0.114])/255.0)
In [68]:
# Compute SIFT features for each

[frames1, descrs1] = cyvlfeat.sift.sift(rgb2gray(im1), peak_thresh=0.01)

[frames1prime, descrs1prime] = cyvlfeat.sift.sift(rgb2gray(im1prime), peak_thresh=0.01)

f, (ax1, ax2) = plt.subplots(1, 2)
plt.sca(ax1)
plt.imshow(im1)
plotframes(frames1, linewidth=1)

plt.sca(ax2)
plt.imshow(im1prime)
plotframes(frames1prime, linewidth=1)

plt.show()

Examine the image and its rotated and scaled version and convince yourself that the detections overlap the same scene regions (even though the circles' positions have moved in the image and their radii have changed). This demonstrates that the detection process is co-variant (equi-variant) with translations, rotations and isotropic scalings. This class of transformations is known as a similarity or equiform transformation.

:: TASK 1.1 ::

Now repeat the exercise with a pair of natural images.

Start by loading the second one: data/oxbuild_lite/all_souls_000015.jpg

Plot the images and feature frames. Again you should see that many of the detections overlap the same scene region. Note that, while repeatability occurs for the pair of natural views, it is much better for the synthetically rotated pair.

In [69]:
# Load an image
im2 = imread('data/oxbuild_lite/all_souls_000015.jpg')

# Rotate and scale it, as before, to check the detector's covariance
im2prime = rotate(np.pad(im2, ((0,0), (200,200),(0,0)), 'constant'), 35, 'bilinear')
h, w, _ = im2prime.shape
im2prime = resize(im2prime, (int(0.7 * h), int(0.7 * w)))
im2prime = (255 * im2prime).astype(np.uint8)

f, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(im2)
ax2.imshow(im2prime)
plt.show()

[frames2, descrs2] = cyvlfeat.sift.sift(rgb2gray(im2), peak_thresh=0.01)

[frames2prime, descrs2prime] = cyvlfeat.sift.sift(rgb2gray(im2prime), peak_thresh=0.01)

f, (ax1, ax2) = plt.subplots(1, 2)
plt.sca(ax1)
plt.imshow(im2)
plotframes(frames2, linewidth=1)

plt.sca(ax2)
plt.imshow(im2prime)
plotframes(frames2prime, linewidth=1)

plt.show()

The number of detected features can be controlled by changing the peak_thresh option. A larger value selects features that correspond to higher-contrast structures in the image.

:: TASK 1.2 ::

For the same image, produce 3 sub-figures with different values of peak_thresh. Comment.

In [70]:
peak_1th = 0.0001
peak_2th = 0.001
peak_3th = 0.01
[frames1th, _] = cyvlfeat.sift.sift(rgb2gray(im2), peak_thresh=peak_1th)
[frames2th, _] = cyvlfeat.sift.sift(rgb2gray(im2), peak_thresh=peak_2th)
[frames3th, _] = cyvlfeat.sift.sift(rgb2gray(im2), peak_thresh=peak_3th)


f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize = (15,7))
plt.sca(ax1)
ax1.set_title('peak threshold = {}'.format(peak_1th))
plt.imshow(im2)
plotframes(frames1th, linewidth=1)

plt.sca(ax2)
ax2.set_title('peak threshold = {}'.format(peak_2th))
plt.imshow(im2)
plotframes(frames2th, linewidth=1)

plt.sca(ax3)
ax3.set_title('peak threshold = {}'.format(peak_3th))
plt.imshow(im2)
plotframes(frames3th, linewidth=1)

plt.tight_layout()
# plt.show()

The peak threshold discards difference-of-Gaussians extrema whose absolute response is too small (below the value passed as a parameter). The DoG response is small where there is little local contrast (each candidate extremum is compared against its 26 neighbours across position and scale, since n_levels = 3 here). So the higher the threshold, the fewer feature points are extracted, and those that survive tend to lie on detailed, high-contrast structures (not the sky, for example, which is nearly uniform).

Feature point description and matching

Introduction to feature descriptors

The parameters $(t_x, t_y, s, \theta)$ of the detected frames can be used to extract a scaled and oriented RGB patch around $(t_x,t_y)$, which is then used to describe the feature point.

The simplest possible descriptor would be to (i) resize these patches to a common size (e.g. 30x30) and (ii) flatten them into a vector. However, in practice we use more sophisticated descriptors such as SIFT, which is based on histograms of gradient orientations.
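For contrast, here is a minimal sketch of the naive flattened-patch descriptor described above (the helper name, patch size and test image are illustrative; it also ignores the frame's orientation, which a real implementation would use to rotate the patch):

```python
import numpy as np
from skimage.transform import resize

def flat_patch_descriptor(image, x, y, radius, patch_size=30):
    """Crop a square patch around (x, y), resize it to a common size,
    and flatten it into a single vector (orientation is ignored here)."""
    h, w = image.shape[:2]
    x0, x1 = max(0, int(x - radius)), min(w, int(x + radius))
    y0, y1 = max(0, int(y - radius)), min(h, int(y + radius))
    patch = resize(image[y0:y1, x0:x1], (patch_size, patch_size))
    return patch.ravel()

img = np.random.rand(100, 120, 3)  # synthetic RGB image
desc = flat_patch_descriptor(img, x=60, y=50, radius=12)
print(desc.shape)  # (2700,) for a 30x30 RGB patch
```

A 30x30 RGB patch already yields a 2700-dimensional vector, much larger than the 128 dimensions of a SIFT descriptor.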

:: TASK 1.3 ::

What is the interest of using SIFT descriptors over these flattened RGB patches?

SIFT descriptors are invariant to affine changes of illumination and robust to small viewpoint changes, which is exactly what we need to compare two pictures of the same object taken from different angles and under different lighting. In addition, the SIFT descriptor is far more compact: 128 dimensions instead of 2700 for a flattened 30x30 RGB patch.

Descriptor matching

SIFT descriptors are 128-dimensional vectors, and can be directly matched to find correspondences between images. We will start with the simplest matching scheme (first nearest neighbour of descriptors in terms of Euclidean distance) and then add more sophisticated methods to eliminate any mismatches.

:: TASK 1.4 ::

For each descriptor in im1, assign a matching descriptor in im2 by picking its first nearest neighbour.

Populate the second column of the matches vector.

In [0]:
# number of detections in image 1
N_frames1 = frames1.shape[0]
# allocate matrix for storing the matches
matches=np.zeros((N_frames1,2),dtype=int)
# the first column of the matrix are the indices on image 1: 0,1,2,....,N_frames1-1
matches[:,0]=range(N_frames1)

# write code to find the matches in image 2 of each feature in image 1 
# populate the second column of the matches vector

import sklearn.metrics.pairwise
matches[:,1] = np.argmin(sklearn.metrics.pairwise.euclidean_distances(descrs1,descrs2), axis = 1)
In [72]:
# plot matches
plt.figure()
plt.title("First NN matching of descriptors")
plt.imshow(np.concatenate((im1,im2),axis=1))
for i in range(N_frames1):
    j=matches[i,1]
    # plot dots at feature positions
    plt.gca().scatter([frames1[i,0],im1.shape[1]+frames2[j,0]], [frames1[i,1],frames2[j,1]], s=5, c='green')
    # plot lines
    plt.plot([frames1[i,0],im1.shape[1]+frames2[j,0]],[frames1[i,1],frames2[j,1]],linewidth=0.5)
plt.show()
In [73]:
# plot matches
plt.figure(figsize=(15,7))
plt.title("Some examples of mismatches")
plt.imshow(np.concatenate((im1,im2),axis=1))
for i in [100,105]:
    j=matches[i,1]
    # plot dots at feature positions
    plt.gca().scatter([frames1[i,0],im1.shape[1]+frames2[j,0]], [frames1[i,1],frames2[j,1]], s=5, c='green')
    # plot lines
    plt.plot([frames1[i,0],im1.shape[1]+frames2[j,0]],[frames1[i,1],frames2[j,1]],linewidth=0.5)
plt.show()

This first approach leads to many mismatches, mainly due to:

  • matching identical patterns that occur in different regions (blue line)
  • changes in illumination (orange line)

Improving SIFT matching (i) using Lowe’s second nearest neighbour test

Lowe introduced a second nearest neighbour (2nd NN) test to identify, and hence remove, ambiguous matches. The idea is to identify distinctive matches by a threshold on the ratio of first to second NN distances.

The ratio is: $$NN_{ratio} = \frac{1^{st}\text{NN distance}}{2^{nd}\text{NN distance}}.$$

:: TASK 1.5 ::

For each descriptor in im1, compute the ratio between the first and second nearest neighbour distances.

Populate the ratio vector.

In [0]:
# allocate matrix for storing the matches
ratio=np.zeros((N_frames1,1))

# write code to find the 1st/2nd NN ratio for each descriptor in im1
# populate the ratio vector
M = sklearn.metrics.pairwise.euclidean_distances(descrs1,descrs2)
# take the two smallest distances in each row without a full sort
vals = np.partition(M, 1, axis=1)[:,:2]
ratio[:,0] = vals[:,0]/vals[:,1]

The ratio vector will be now used to retain only the matches that are above a given threshold.

A value of $NN_{threshold} = 0.8$ is often a good compromise between losing too many matches and rejecting mismatches.

In [75]:
NN_threshold=0.8

filtered_indices = np.flatnonzero(ratio<NN_threshold)
filtered_matches = matches[filtered_indices,:]
# plot matches
plt.figure()
plt.imshow(np.concatenate((im1,im2),axis=1))
for idx in range(filtered_matches.shape[0]):
    i=filtered_matches[idx,0]
    j=filtered_matches[idx,1]
    # plot dots at feature positions
    plt.gca().scatter([frames1[i,0],im1.shape[1]+frames2[j,0]], [frames1[i,1],frames2[j,1]], s=5, c='green') 
    # plot lines
    plt.plot([frames1[i,0],im1.shape[1]+frames2[j,0]],[frames1[i,1],frames2[j,1]],linewidth=0.5)
plt.show()

We see an improvement here since the most ambiguous matches are discarded. Lowering the threshold even further leaves almost only correct matches, but far fewer of them.

In [76]:
NN_threshold=0.3

filtered_indices = np.flatnonzero(ratio<NN_threshold)
filtered_matches = matches[filtered_indices,:]
# plot matches
plt.figure()
plt.title("Lowe 2nd NN with threshold 0.3")
plt.imshow(np.concatenate((im1,im2),axis=1))
for idx in range(filtered_matches.shape[0]):
    i=filtered_matches[idx,0]
    j=filtered_matches[idx,1]
    # plot dots at feature positions
    plt.gca().scatter([frames1[i,0],im1.shape[1]+frames2[j,0]], [frames1[i,1],frames2[j,1]], s=5, c='green') 
    # plot lines
    plt.plot([frames1[i,0],im1.shape[1]+frames2[j,0]],[frames1[i,1],frames2[j,1]],linewidth=0.5)
plt.show()
In [77]:
L_th = np.linspace(0,1,21)
numb_matches = [len(np.flatnonzero(ratio<th)) for th in L_th]
plt.title("Evolution of kept matches wrt the threshold | Lowe's 2nd NN test")
plt.xlabel("Threshold")
plt.ylabel("Kept matches")
plt.plot(L_th, numb_matches);

Improving SIFT matching (ii) by geometric verification

In addition to the 2nd NN test, we can also require consistency between the matches and a geometric transformation between the images. For the moment we will look for matches that are consistent with a similarity transformation:

$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = sR(\theta) \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} $$

which consists of a rotation by $\theta$, an isotropic scaling (i.e. same in all directions) by s, and a translation by a vector $(t_x, t_y)$. This transformation is specified by four parameters $(s,\theta,t_x,t_y)$ and can be computed from a single correspondence between SIFT detections in each image.

:: TASK 1.6 ::

Given a detected feature with parameters $(x_1, y_1, s_1, \theta_1)$ in image $1$ matching a feature $(x_2, y_2, s_2, \theta_2)$ in image $2$, work out how to find out the parameters $(t_x,t_y,s,\theta)$ of the transformation mapping points from image $1$ to image $2$.

$$\theta=\theta_2 - \theta_1$$$$s=\frac{s_2}{s_1}$$$$\begin{bmatrix} t_x \\ t_y \end{bmatrix} =\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} - sR(\theta) \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$
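A quick numerical check of these formulas (the frame values and the ground-truth transform below are made up for illustration): generate a correspondence under a known similarity, recover the parameters from that single correspondence, and verify they match.

```python
import numpy as np

# ground-truth similarity: rotation theta, scale s, translation (t_x, t_y)
s_true, theta_true, t_true = 1.5, 0.4, np.array([10.0, -5.0])
R = np.array([[np.cos(theta_true), -np.sin(theta_true)],
              [np.sin(theta_true),  np.cos(theta_true)]])

# one matched frame: position, scale and orientation in each image
p1 = np.array([30.0, 40.0])
s1, th1 = 2.0, 0.1
p2 = s_true * R @ p1 + t_true            # transformed position
s2, th2 = s_true * s1, th1 + theta_true  # transformed scale and orientation

# recover the parameters from the single correspondence
theta = th2 - th1
s = s2 / s1
Rr = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
t = p2 - s * Rr @ p1

print(np.isclose(s, s_true), np.allclose(t, t_true))  # True True
```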

The matches consistent with a similarity can then be found using the RANSAC algorithm, described by the following steps:

For each tentative correspondence in turn:

  • compute the similarity transformation;
  • map all the SIFT detections in one image to the other using this transformation;
  • accept matches that are within a threshold distance to the mapped detection (inliers);
  • count the number of accepted matches;
  • optionally, fit a more accurate affine transformation or homography to the accepted matches and re-validate them.

Finally, choose the transformation with the highest count of inliers.

After this algorithm the inliers are consistent with the transformation and are retained, and most mismatches should now be removed.

:: TASK 1.7 ::

In [0]:
def ransac(frames1,frames2,matches,N_iters=100,dist_thresh=15):
    # initialize
    max_inliers=0
    tnf=None
    best_inliers_indices=np.array([],dtype=int)
    # run random sampling
    for it in range(N_iters):
        # pick a random sample
        i = np.random.randint(0,frames1.shape[0])
        x_1,y_1,s_1,theta_1=frames1[i,:]
        j = matches[i,1]
        x_2,y_2,s_2,theta_2=frames2[j,:]

        # estimate transformation

        # COMPLETE BELOW #

        theta = theta_2 - theta_1 
        s = s_2 / s_1
        t_x = x_2 - s*(np.cos(theta)*x_1-np.sin(theta)*y_1)
        t_y = y_2 - s*(np.sin(theta)*x_1+np.cos(theta)*y_1)

        # evaluate estimated transformation
        X_1 = frames1[:,0]
        Y_1 = frames1[:,1]
        X_2 = frames2[matches[:,1],0]
        Y_2 = frames2[matches[:,1],1]

        X_1_prime = s*(X_1*np.cos(theta)-Y_1*np.sin(theta))+t_x
        Y_1_prime = s*(X_1*np.sin(theta)+Y_1*np.cos(theta))+t_y

        dist = np.sqrt((X_1_prime-X_2)**2+(Y_1_prime-Y_2)**2)

        inliers_indices = np.flatnonzero(dist<dist_thresh)
        num_of_inliers = len(inliers_indices)

        # keep if best
        if num_of_inliers>max_inliers:
            max_inliers=num_of_inliers
            best_inliers_indices = inliers_indices
            tnf = [t_x,t_y,s,theta]

    return (tnf,best_inliers_indices)
In [0]:
tnf,inliers_indices=ransac(frames1,frames2,matches)
filtered_matches = matches[inliers_indices,:]
In [80]:
# plot matches filtered with RANSAC
plt.figure()
plt.imshow(np.concatenate((im1,im2),axis=1))
for idx in range(filtered_matches.shape[0]):
    i=filtered_matches[idx,0]
    j=filtered_matches[idx,1]
    # plot dots at feature positions
    plt.gca().scatter([frames1[i,0],im1.shape[1]+frames2[j,0]], [frames1[i,1],frames2[j,1]], s=5, c='green')
    # plot lines
    plt.plot([frames1[i,0],im1.shape[1]+frames2[j,0]],[frames1[i,1],frames2[j,1]],linewidth=0.5)
plt.show()

As expected, backing up SIFT matching with geometric verification radically improves the matching in terms of precision/recall performance.

Part 2: Compact descriptors for image retrieval

In large scale retrieval the goal is to match a query image to a large database of images (for example the WWW or Wikipedia).

The quality of an image match is measured as the number of geometrically verified feature correspondences between the query and a database image. While the techniques discussed in Part 1 are sufficient to do this, in practice they require too much memory to store the SIFT descriptors for all the detections in all the database images.
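A rough back-of-envelope calculation makes the memory argument concrete (the database size and per-image feature count below are illustrative assumptions, not figures from the assignment):

```python
# illustrative assumptions: 1M database images, ~2000 SIFT features each,
# 128 bytes per descriptor (uint8 SIFT)
n_images = 1_000_000
feats_per_image = 2_000
bytes_per_descr = 128

raw_bytes = n_images * feats_per_image * bytes_per_descr
print(f"raw SIFT storage: {raw_bytes / 1e9:.0f} GB")  # 256 GB

# a BoVW histogram over a 10k-word vocabulary is sparse: storing e.g. one
# 4-byte entry per non-zero word shrinks the index dramatically
bovw_bytes = n_images * feats_per_image * 4
print(f"approx BoVW storage: {bovw_bytes / 1e9:.0f} GB")  # 8 GB
```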

In this part we will see how we can compute a global image descriptor from the set of SIFT descriptors using the bag-of-visual-words (BoVW) approach.

Then, we will see how these global descriptors can be used to rapidly retrieve a shortlist of candidate database images given a query image. Finally, we will see how to re-rank the shortlist of candidates using a geometric verification technique that requires only the detector frames and their assigned visual word indices; remember the SIFT descriptors are only used to compute the compact BoVW descriptors and then discarded.

Download preprocessed dataset of paintings

In [74]:
print('downloading dictionary of SIFT features and precomputed BoVW descriptors for painting dataset...')
!wget -c http://www.di.ens.fr/willow/teaching/recvis/assignment1/paintings_imdb_SIFT_10k_preprocessed.mat
print('done!')
downloading dictionary of SIFT features and precomputed BoVW descriptors for painting dataset...
--2019-10-16 14:51:16--  http://www.di.ens.fr/willow/teaching/recvis/assignment1/paintings_imdb_SIFT_10k_preprocessed.mat
Resolving www.di.ens.fr (www.di.ens.fr)... 129.199.99.14
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/willow/teaching/recvis/assignment1/paintings_imdb_SIFT_10k_preprocessed.mat [following]
--2019-10-16 14:51:17--  https://www.di.ens.fr/willow/teaching/recvis/assignment1/paintings_imdb_SIFT_10k_preprocessed.mat
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 105953851 (101M)
Saving to: ‘paintings_imdb_SIFT_10k_preprocessed.mat’

paintings_imdb_SIFT 100%[===================>] 101.04M  2.48MB/s    in 26s     

2019-10-16 14:51:44 (3.83 MB/s) - ‘paintings_imdb_SIFT_10k_preprocessed.mat’ saved [105953851/105953851]

done!

Load preprocessed dataset of paintings

We will now load the preprocessed dataset of paintings. The construction of this dataset has involved several steps.

  1. SIFT features were extracted from all paintings in the dataset.
  2. A global vocabulary of SIFT descriptors was computed using K-means clustering. These are the visual words of our dataset.
  3. The SIFT features for each painting were assigned to the nearest word, and a compact descriptor was generated for each painting. This compact descriptor consists of the normalized histogram of words. The histogram normalization itself involves 3 different steps:
    • a) TF-IDF weighting: each word is re-weighted according to its TF-IDF value. This down-weights words that are very common and therefore not very discriminative.
    • b) Square-rooting: each element is square-rooted.
    • c) L2-normalization: the whole histogram is L2-normalized.
In [81]:
imdb=loadmat('paintings_imdb_SIFT_10k_preprocessed.mat')

feature_vocab=np.transpose(imdb['vocab'])
imdb_hists=imdb['index']
imdb_tfidf=imdb['idf']

imdb_url = lambda idx: imdb['images'][0][0][3][0][idx].item()

num_words = feature_vocab.shape[0]
num_paintings = imdb_hists.shape[1]

print('The vocabulary of SIFT features contains %s  visual words' % num_words)
print('The dictionary index contains %s histograms corresponding to each painting in the dataset' % num_paintings)
print('The tf-idf vector has shape '+str(imdb_tfidf.shape))
The vocabulary of SIFT features contains 10000  visual words
The dictionary index contains 1703 histograms corresponding to each painting in the dataset
The tf-idf vector has shape (10000, 1)
In [82]:
painting = imread('data/queries/mistery-painting1.jpg')

[frames, descrs] = cyvlfeat.sift.sift(rgb2gray(painting), peak_thresh=0.01)

plt.figure()
plt.imshow(painting)
plotframes(frames,linewidth=1)

:: TASK 2.1 ::

Construct a KDTree of the vocabulary for fast NN search. Then use the KDTree to find the closest word in the vocabulary to each descriptor of the query image.

In [0]:
from sklearn.neighbors import KDTree

tree = KDTree(feature_vocab)
_, ind = tree.query(descrs)

:: TASK 2.2 ::

Compute the compact BoVW descriptor of the query image.

In [0]:
query_hist=np.zeros((num_words,1))
his, bins = np.histogram(ind.ravel(),bins=10000, range=(0,10000))
query_hist[:,0] = his
In [0]:
# process histogram
query_hist = query_hist*imdb_tfidf
query_hist = np.sqrt(query_hist)
query_hist = query_hist/np.linalg.norm(query_hist)

:: TASK 2.3 ::

Compute the matching score with each image from the database.

In [0]:
from sklearn.metrics.pairwise import cosine_similarity
scores = np.zeros((1,num_paintings))
scores[0,:] = cosine_similarity(X=imdb_hists.T, Y=query_hist.T).ravel()
In [0]:
# sort in descending order
scores_sorted_idx = np.argsort(-scores)
scores_sorted = scores.ravel()[scores_sorted_idx]
In [88]:
# plot top matches
N=10
top_N_idx = scores_sorted_idx.ravel()[:N]
plt.figure()
for i in range(N):
    # download images
    url = imdb_url(top_N_idx[i])      
    img = imread(url, pilmode='RGB')
    # choose subplot
    plt.subplot(int(np.ceil(N/5)),5,i+1)
    # plot
    plt.imshow(img)
    plt.axis('off')
    plt.title('score %1.2f' % scores_sorted.ravel()[i])
In [102]:
url = imdb_url(top_N_idx[0])      
IMG1 = imread(url, pilmode='RGB')
# Computing sift descriptors of the top image
[FRAME1, DESCRS1] = cyvlfeat.sift.sift(rgb2gray(IMG1), peak_thresh=0.01)
# Computing sift descriptors of the query image
padded_painting = np.pad(painting, ((0,IMG1.shape[0] - painting.shape[0]), (0,0), (0,0)), mode='constant')
[frames, descrs] = cyvlfeat.sift.sift(rgb2gray(painting), peak_thresh=0.01)

matches=np.zeros((frames.shape[0],2),dtype=int)
matches[:,0]=range(frames.shape[0])
# Compute the matches between the SIFT descriptors of the query and those of the top-1 image
matches[:,1] = np.argmin(sklearn.metrics.pairwise.euclidean_distances(descrs,DESCRS1), axis = 1)
tnf,inliers_indices=ransac(frames,FRAME1,matches)
filtered_matches = matches[inliers_indices,:]
# plot matches filtered with RANSAC
plt.figure(figsize=(15,7))
plt.subplot(1,2,1)
plt.title("Query painting")
plt.imshow(painting)
plt.subplot(1,2,2)
plt.title("RANSAC correspondences between query and TOP 1 result")
plt.imshow(np.concatenate((padded_painting,IMG1),axis=1))
for idx in range(filtered_matches.shape[0]):
    i=filtered_matches[idx,0]
    j=filtered_matches[idx,1]
    # plot dots at feature positions
    plt.gca().scatter([frames[i,0],painting.shape[1]+FRAME1[j,0]], [frames[i,1],FRAME1[j,1]], s=5, c='green')
    # plot lines
    plt.plot([frames[i,0],painting.shape[1]+FRAME1[j,0]],[frames[i,1],FRAME1[j,1]],linewidth=0.5)
plt.show()
In [103]:
url = imdb_url(top_N_idx[1])      
IMG1 = imread(url, pilmode='RGB')
# Computing sift descriptors of the top image
[FRAME1, DESCRS1] = cyvlfeat.sift.sift(rgb2gray(IMG1), peak_thresh=0.01)
# Computing sift descriptors of the query image
padded_painting = np.pad(painting, ((0,IMG1.shape[0] - painting.shape[0]), (0,0), (0,0)), mode='constant')
[frames, descrs] = cyvlfeat.sift.sift(rgb2gray(painting), peak_thresh=0.01)

matches=np.zeros((frames.shape[0],2),dtype=int)
matches[:,0]=range(frames.shape[0])
# Compute the matches between the SIFT descriptors of the query and those of the top-2 image
matches[:,1] = np.argmin(sklearn.metrics.pairwise.euclidean_distances(descrs,DESCRS1), axis = 1)
tnf,inliers_indices=ransac(frames,FRAME1,matches)
filtered_matches = matches[inliers_indices,:]
# plot matches filtered with RANSAC
plt.figure(figsize=(15,7))
plt.subplot(1,2,1)
plt.title("Query painting")
plt.imshow(painting)
plt.subplot(1,2,2)
plt.title("RANSAC correspondences between query and TOP 2 result")
plt.imshow(np.concatenate((padded_painting,IMG1),axis=1))
for idx in range(filtered_matches.shape[0]):
    i=filtered_matches[idx,0]
    j=filtered_matches[idx,1]
    # plot dots at feature positions
    plt.gca().scatter([frames[i,0],painting.shape[1]+FRAME1[j,0]], [frames[i,1],FRAME1[j,1]], s=5, c='green')
    # plot lines
    plt.plot([frames[i,0],painting.shape[1]+FRAME1[j,0]],[frames[i,1],FRAME1[j,1]],linewidth=0.5)
plt.show()

AUTHORSHIP STATEMENT

I declare that the preceding work was the sole result of my own effort and that I have not used any code or results from third parties.

KILANI AL HOUCEINE